A neural network speech recognizer based on both acoustic steady portions and transitions
Abstract
Previous work on neural-network speech recognition has often relied either on recognition through segmentation or on mapping representation trajectories to the phoneme space. In both cases, information can be lost because of the way boundaries are labeled. Recent work has indicated, first, that phonetic boundaries and transitions have good potential to be recognized as acoustic units and, second, that treating fast transitions as fixed cues in time lets neural networks detect and recognize those events with high performance. This approach was demonstrated by recognizing basic units formed from the VC and CV boundaries in spoken Farsi (Persian). Analysis of the resulting errors revealed certain discrepancies between the theoretical linguistic view and the implementation outcome: the CV, CVC and CVCC linguistic models for Farsi syllables do not always match the reality of the acoustic space in the speech signal.
Similar resources
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert spoken utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Convolutional neural network with adaptive windows for speech recognition
Although speech recognition systems are widely used and their accuracy is continuously increasing, there is a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
An Intonational Phrase Boundary and Pitch Accent Dependent Speech Recognizer
Does prosody help word recognition? In this paper, we propose a novel probabilistic framework in which word and phoneme are dependent on prosody in a way that improves word recognition. We describe the idea of prosody dependent speech recognition by building a prosody dependent speech recognizer that conditions word and phoneme models on two important prosodic variables: intonational phrase bou...
Speech/Music Discrimination Based on Posterior Probability Features (submitted to Eurospeech’99, Budapest)
A hybrid connectionist-HMM speech recognizer uses a neural network acoustic classifier. This network estimates the posterior probability that the acoustic feature vectors at the current time step should be labelled as each of around 50 phone classes. We sought to exploit informal observations of the distinctions in this posterior domain between nonspeech audio and speech segments well-modeled b...